---
id: qa-agent-choosing-metrics
title: Choosing the right Metrics for your QA Agent
sidebar_label: Choosing your Metrics
---

To choose the right metrics, we'll need to revisit our evaluation criteria. In the previous section, we observed a few responses from our QA agent and established that all responses should be:

1. **Relevant** to the user query
2. **Non-speculative** (it shouldn't fabricate information when asked questions that require details not present in the knowledge base)

:::tip
Having clear evaluation criteria helps you easily identify the **specific evaluation metrics** that are relevant to your values and use case.
:::

## Choosing your Metrics

Our first criterion requires that the QA agent's answers be relevant. This makes the `AnswerRelevancy` metric a straightforward choice, as it directly measures the answer relevance with respect to the user input, and is readily available in `DeepEval`.

```python
from deepeval.metrics import AnswerRelevancyMetric
```

:::info
`AnswerRelevancy` and `Faithfulness` are metrics designed specifically for evaluating RAG systems. If you're not familiar with RAG metrics, this [comprehensive guide](/guides/guides-rag-evaluation) is a must-read, especially if you're building QA agents.
:::

To ensure that our answers are non-speculative, we need to check that the QA agent only includes information from our knowledge base. Fortunately, `Faithfulness` measures whether the generated output factually aligns with the information in the retrieved context, which lets us catch cases where the LLM produces speculative information.

```python
from deepeval.metrics import FaithfulnessMetric
```

It's important to note that although our evaluation criteria happen to align with RAG metrics, this is not always the case, even for QA agents. For example, we mentioned earlier that answers should ideally sound more human, although this is not a priority. If that criterion were prioritized over answer relevancy or faithfulness, a custom `GEval` metric for humanness might take precedence.

:::note
G-Eval is a metric in DeepEval that lets you create a custom metric from user-provided criteria. Learn more about `GEval` here.
:::
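As a rough illustration of the humanness example above, a custom `GEval` metric could be sketched as follows. The criteria wording and the helper name `build_humanness_metric` are assumptions made for this sketch, not part of the tutorial; adjust them to your own definition of "human-sounding".

```python
# Hypothetical criteria text; reword it to match your own notion of humanness.
HUMANNESS_CRITERIA = (
    "Determine whether the actual output sounds like a natural, human-written "
    "reply to the input, rather than a stiff or robotic one."
)


def build_humanness_metric():
    """Build a custom GEval metric for humanness (illustrative helper)."""
    # Imported inside the function so the criteria can be defined and reviewed
    # without constructing the metric, which may require an LLM to be configured.
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCaseParams

    return GEval(
        name="Humanness",
        criteria=HUMANNESS_CRITERIA,
        # The parts of a test case the LLM judge should look at.
        evaluation_params=[
            LLMTestCaseParams.INPUT,
            LLMTestCaseParams.ACTUAL_OUTPUT,
        ],
    )
```

Calling `build_humanness_metric()` requires `deepeval` to be installed and, in most setups, an evaluation model to be configured.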

## Defining these Metrics

In `deepeval`, defining metrics is as easy as importing and initializing them:

```python
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric

answer_relevancy_metric = AnswerRelevancyMetric()
faithfulness_metric = FaithfulnessMetric()
```
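If you want a stricter or looser passing bar, both metrics accept a `threshold` argument. The sketch below also shows which test case fields each metric relies on: answer relevancy compares the `input` with the `actual_output`, while faithfulness additionally needs the `retrieval_context`. The sample data, the `0.7` threshold, and the helper names are hypothetical and only serve as an illustration.

```python
# Hypothetical example data; replace with a real query, answer, and retrieved chunks.
SAMPLE_CASE = {
    "input": "What is your refund policy?",
    "actual_output": "You can request a refund within 30 days of purchase.",
    "retrieval_context": ["Refunds are available within 30 days of purchase."],
}


def build_rag_metrics(threshold: float = 0.7):
    """Return both metrics with a shared passing threshold (illustrative helper)."""
    # Imported here so the sample data above can be inspected on its own.
    from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric

    return [
        AnswerRelevancyMetric(threshold=threshold),
        FaithfulnessMetric(threshold=threshold),
    ]


def build_test_case(fields: dict = SAMPLE_CASE):
    """Wrap the raw fields in an LLMTestCase that the metrics can score."""
    from deepeval.test_case import LLMTestCase

    return LLMTestCase(**fields)
```

Without a `retrieval_context`, the faithfulness metric has nothing to check the answer against, so make sure your QA agent exposes the retrieved chunks alongside its response.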

:::tip
DeepEval offers 20+ metrics out of the box. You can learn more about them [here](#).
:::

With our metrics defined and our evaluation dataset pushed to Confident AI, we're ready to begin running evaluations.
